Utilizing Domain Knowledge in End-to-End Audio Processing
Abstract
End-to-end neural-network-based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work showing that it is feasible to train the first layers of a deep convolutional neural network (CNN) to learn the commonly used log-scaled mel-spectrogram transformation. We then demonstrate that, when the first layers of an end-to-end CNN classifier are initialized with the learned transformation, convergence and performance on the ESC-50 environmental sound classification dataset are similar to those of a CNN-based model trained on the highly pre-processed log-scaled mel-spectrogram features.
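For orientation, the fixed log-scaled mel-spectrogram transformation that the abstract refers to can be sketched as follows. This is a minimal standard-library illustration, not the paper's learned CNN layers: the function names, the frame size, hop size, number of mel bands, the `2595 * log10(1 + f / 700)` mel formula, and the decibel-style `10 * log10` scaling are all assumptions chosen for the example, and the naive DFT stands in for the FFT a real pipeline would use.

```python
import cmath
import math


def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    bin_hz = [k * sr / n_fft for k in range(n_bins)]
    top = hz_to_mel(sr / 2.0)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        bank.append([
            max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
            for f in bin_hz
        ])
    return bank


def power_spectrum(frame):
    """Power spectrum of one Hann-windowed frame via a naive O(n^2) DFT."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2.0 * math.pi * t / n))
                for t, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_spectrogram(signal, sr, n_fft=256, hop=128, n_mels=20, eps=1e-10):
    """Return a list of frames, each a list of n_mels log-scaled band energies."""
    bank = mel_filterbank(sr, n_fft, n_mels)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        frames.append([
            10.0 * math.log10(sum(w * p for w, p in zip(row, power)) + eps)
            for row in bank
        ])
    return frames


if __name__ == "__main__":
    sr = 8000  # hypothetical sample rate for the demo
    tone = [math.sin(2.0 * math.pi * 1000.0 * t / sr) for t in range(1024)]
    spec = log_mel_spectrogram(tone, sr)
    print(len(spec), "frames x", len(spec[0]), "mel bands")
```

Each stage here (windowed framing, a linear frequency transform, a fixed linear filterbank, and a pointwise logarithm) is the kind of operation that the first layers of a CNN could in principle approximate, which is the premise the abstract builds on.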
Similar resources
Structuring and Querying Personalized Audio Using Ontologies (Ph.D. Thesis Proposal)
User-customized information selection and delivery reduces the complexity of the overwhelming amount of information available to end-users. Our approach employs user profiles, data selection, and presentation facilities to deliver customized audio information to end-users. Specifically, we construct a domain-dependent ontology (a collection of key concepts and their inter-relationships) to enab...
End-to-end learning for music audio tagging at scale
The lack of data tends to limit the outcomes of deep learning research – specially, when dealing with end-to-end learning stacks processing raw data such as waveforms. In this study we make use of musical labels annotated for 1.2 million tracks. This large amount of data allows us to unrestrictedly explore different front-end paradigms: from assumption-free models – using waveforms as input wit...
Metadata Tools for Digital Motion Picture Archives
Most of the video information retrieval systems today rely on some set of computationally extracted video and/or audio features, which may be complemented with manually created annotation that is usually either arduous to create or insufficient for capturing the content. This thesis looks at the specific domain of motion pictures to identify the computational features relevant to films and, mor...
Cipher text only attack on speech time scrambling systems using correction of audio spectrogram
Recently permutation multimedia ciphers were broken in a chosen-plaintext scenario. That attack models a very resourceful adversary which may not always be the case. To show insecurity of these ciphers, we present a cipher-text only attack on speech permutation ciphers. We show inherent redundancies of speech can pave the path for a successful cipher-text only attack. To that end, regularities ...
Software-Based Video/Audio Processing for Cellular Phones
Nowadays, most cellular phones are used beyond voice communication. Although the processing power of cellular phones is sufficient for most data applications, it is difficult to play video and audio contents in software because of their computational complexity and lack of basic tools for multimedia processing, so software-based multimedia processing on cellular phones is a challenging issue. S...
Journal title:
- CoRR
Volume: abs/1712.00254
Pages: -
Publication date: 2017